The assignment of chunk size according to the target data characteristics in deduplication backup system
Authors
Abstract
This paper focuses on the trade-off between the deduplication rate and the processing penalty in a backup system that uses a conventional variable chunking method. When the chunk size is fixed, the trade-off takes the form of a nonlinear negative correlation. To analyze the trade-off quantitatively across all the factors involved, a simulation approach is taken, which clarifies several correlations among the chunk sizes, the densities, and the average lengths of the differing parts. It then shows that dynamically assigning an appropriate chunk size based on the data characteristics is effective in weakening the trade-off and provides higher efficiency than the conventional approach.
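The trade-off described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's simulator. It assumes a simple Gear-style rolling hash for variable (content-defined) chunking, random data as the backup stream, and random insertions as the "differing parts". The names `cdc_chunks` and `simulate`, and the parameters `n_edits` (density of differing parts) and `edit_len` (their length), are hypothetical. The number of chunks is used as a rough stand-in for the processing penalty, since each chunk costs one hash computation and one index lookup.

```python
import hashlib
import random

# Random per-byte table for a Gear-style rolling hash (an assumption of this sketch).
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data, avg_size, min_size=32):
    """Split data at content-defined boundaries; avg_size must be a power of two."""
    mask = avg_size - 1
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        # Cut when the low bits of the hash are zero and the chunk is long enough.
        if i + 1 - start >= min_size and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def simulate(avg_size, n_edits=20, edit_len=64, size=200_000, seed=1):
    """Return (deduplication rate, chunk count) for a second backup with n_edits insertions."""
    r = random.Random(seed)
    base = bytes(r.getrandbits(8) for _ in range(size))
    new = bytearray(base)
    for _ in range(n_edits):
        pos = r.randrange(len(new))
        new[pos:pos] = bytes(r.getrandbits(8) for _ in range(edit_len))
    # Chunk fingerprints already stored by the first backup.
    stored = {hashlib.sha1(c).digest() for c in cdc_chunks(base, avg_size)}
    new_chunks = cdc_chunks(bytes(new), avg_size)
    dup_bytes = sum(len(c) for c in new_chunks if hashlib.sha1(c).digest() in stored)
    return dup_bytes / len(new), len(new_chunks)


for avg in (256, 1024, 4096):
    rate, n_chunks = simulate(avg)
    print(f"avg chunk size {avg:5d}: dedup rate {rate:.3f}, chunks {n_chunks}")
```

In this toy setting, smaller chunks tend to recover more duplicate bytes around each differing part but produce many more chunks to hash and look up. Varying `n_edits` and `edit_len` changes which chunk size gives the better balance, which is the motivation for choosing the chunk size from the data characteristics.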
Similar resources
An Efficient Data Deduplication based on Tar-format Awareness in Backup Applications
Disk-based backup storage system is utilized widely, and data deduplication is becoming an essential technique in the system because of the advantage of a space efficiency. Usually, user’s several files are aggregated into a single Tar file at primary storage, and the Tar file is transferred and stored to the backup storage system periodically (e.g., a weekly full backup) [1]. In this paper, we ...
Improving restore speed for backup systems that use inline chunk-based deduplication
Slow restoration due to chunk fragmentation is a serious problem facing inline chunk-based data deduplication systems: restore speeds for the most recent backup can drop orders of magnitude over the lifetime of a system. We study three techniques—increasing cache size, container capping, and using a forward assembly area— for alleviating this problem. Container capping is an ingest-time operati...
Target Deduplication Metrics and Risk Analysis Using Post Processing Methods
In modern intelligent storage technologies deduplication of data is a data compression technique used for discarding duplicate copies of repeating data. It is used to improve storage utilization and applied to huge network data transfers to reduce the number of bytes which is to be transferred. In this process, similar data chunks, patterned bytes, are classified and stored at this stage. As a ...
A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique
Data deduplication describes an approach that reduces the storage capacity needed to store data or the amount of data that has to be transferred over the network. Cloud storage has received increasing attention from industry as it offers infinite storage resources that are available on demand. Source deduplication is useful in cloud backup because it saves network bandwidth and reduces network space. Deduplication is the pr...
ChunkStash: Speeding Up Inline Storage Deduplication Using Flash Memory
Storage deduplication has received recent interest in the research community. In scenarios where the backup process has to complete within short time windows, inline deduplication can help to achieve higher backup throughput. In such systems, the method of identifying duplicate data, using disk-based indexes on chunk hashes, can create throughput bottlenecks due to disk I/Os involved in index l...